Abstract



The question answering task is currently being carried out in TREC8 using English documents. We
examined the question answering task for Japanese sentences. Our method selects the answer
by matching the question sentence with knowledge-based data written in natural language.
We use syntactic information to obtain highly accurate answers.



1 Introduction 



The question answering task has been carried out in TREC8 using English documents [1]. Here, we
examine the question answering task for Japanese sentences¹,². Our approach is to use syntactic
information.



2 Question Answering System 
2.1 Outline 

1. The system detects keywords in the question sentence, and then detects sentences in which
the sum of the keywords' IDF values is high³.

2. The question sentence and the detected sentences are parsed by the Japanese syntactic
analyzer [?]. (This allows us to obtain the dependency structures.)

3. The answer is selected by matching a question sentence and the detected sentences using 
syntactic information. How this is done is described in the next section. 
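
Step 1 of the outline can be illustrated with a minimal sketch. The paper does not give the exact IDF formula or the tokenization, so the code below assumes IDF = log(N / df) over pre-tokenized, romanized toy sentences; the function names `idf_table` and `retrieve` are hypothetical and are not part of the described system.

```python
import math
from collections import Counter

def idf_table(sentences):
    """Map each token to log(N / df); df = number of sentences containing it.

    The log(N / df) form is an assumption; the paper does not state its formula.
    """
    n = len(sentences)
    df = Counter(tok for sent in sentences for tok in set(sent))
    return {tok: math.log(n / count) for tok, count in df.items()}

def retrieve(question_keywords, sentences, top_k=3):
    """Rank sentences by the summed IDF of the question keywords they contain."""
    idf = idf_table(sentences)
    keywords = set(question_keywords)
    scored = []
    for index, sent in enumerate(sentences):
        present = set(sent) & keywords
        scored.append((sum(idf[tok] for tok in present), index))
    # Highest score first; ties are broken by database order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for score, index in scored[:top_k] if score > 0]

# Toy database of pre-tokenized sentences (illustrative data only).
database = [
    ["kanpara", "wa", "uganda", "kyouwakoku", "no", "shuto", "desu"],
    ["toukyou", "wa", "nihon", "no", "shuto", "desu"],
    ["uganda", "wa", "afurika", "no", "kuni", "desu"],
]
best = retrieve(["uganda", "shuto"], database, top_k=1)
ranked = retrieve(["uganda", "shuto"], database, top_k=3)
```

In this toy database the first sentence contains both keywords, so it is ranked first; the sentences it returns would then be passed to the parser in step 2.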



1 With respect to Japanese sentences, domain-dependent work such as that on dialogue systems and help
systems has been done [?], but little work has been done on detecting the answer from natural-language
databases as in the question answering task. However, much has been done on English sentences, from work on
detecting sentences in written answers [?] to work on detecting answers themselves [?].

2 This paper outlines one part of a question answering system that we have been developing for a long time
[?].

3 In this paper, the system detects one sentence at a time. However, it would be better to detect a
series of sentences and detect the answer from that series. If a series of sentences is used to detect the
answer, context information can be used.
2.2 Matching a Question and Detected Sentences Using Syntactic Information



We use the syntactic information when matching a question sentence and the detected sentences. 
The score of a detected sentence s is as follows. 

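\[ \mathrm{Score}(s) = \mathrm{B1}(s) + \alpha \cdot \mathrm{B2}(s) - \beta \cdot \mathrm{DNUM}(s) \tag{1} \]

\[ \mathrm{B1}(s) = \sum_{b} \mathrm{BNST1}(b) \tag{2} \]

\[ \mathrm{B2}(s) = \sum_{(b_1, b_2)} \mathrm{BNST2}(b_1, b_2) \tag{3} \]

The sum in Eq. (2) runs over all bunsetsus b in the question sentence. The sum in Eq. (3) runs
over all pairs of two bunsetsus (b1, b2) in the question sentence, where b1 depends on b2
(i.e., b2 is the head of b1).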

Each of the bunsetsus in the question sentence can be paired with one of the bunsetsus in
the sentence s so as to maximize the value of Score(s). (A bunsetsu in Japanese corresponds
to a phrasal unit such as a noun phrase or a prepositional phrase in English.) BNST1(b) is
the similarity between the bunsetsu b in the question sentence and the bunsetsu in the detected
sentence s paired with b. BNST2(b1, b2) is the similarity between the pair of
bunsetsus (b1, b2) and the pair of bunsetsus in the detected sentence s paired with b1
and b2. DNUM(s) is the number of bunsetsus in the sentence s. α and β are constants, and
are set by experiment. (Although we use only monomial and binomial syntactic information in
Eq. (1), we can also use trinomial or polynomial syntactic information.)
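
The following is a minimal sketch of how the score of Eq. (1) could be computed by trying every pairing, under several assumptions that are not in the paper: a question bunsetsu may be left unpaired, no two question bunsetsus share a sentence bunsetsu, and the similarity functions are supplied by the caller. The names `score_sentence`, `toy_bnst1`, and `toy_bnst2`, the toy similarity values, and the constants are all hypothetical. Exhaustive search is used only for clarity; it is not claimed to be how the system searches.

```python
from itertools import permutations

def score_sentence(q_bunsetsus, q_deps, s_bunsetsus, bnst1, bnst2, alpha, beta):
    """Return (best score, best pairing) for Eq. (1) by exhaustive search.

    q_deps holds index pairs (i, j): question bunsetsu i depends on j.
    pairing[i] is the index of the sentence bunsetsu paired with question
    bunsetsu i, or None if it is left unpaired (an assumption of this sketch).
    """
    slots = list(range(len(s_bunsetsus))) + [None] * len(q_bunsetsus)
    best_score, best_pairing = float("-inf"), None
    for pairing in permutations(slots, len(q_bunsetsus)):
        b1 = sum(bnst1(i, p) for i, p in enumerate(pairing) if p is not None)
        b2 = sum(bnst2((i, j), (pairing[i], pairing[j]))
                 for i, j in q_deps
                 if pairing[i] is not None and pairing[j] is not None)
        total = b1 + alpha * b2 - beta * len(s_bunsetsus)  # DNUM(s) penalty
        if total > best_score:
            best_score, best_pairing = total, pairing
    return best_score, best_pairing

# Toy data: bunsetsu segmentation and dependencies would come from the parser.
question = ["Uganda no", "shuto wa", "doko desu ka"]
question_deps = [(0, 1), (1, 2)]
sentence = ["kanpara wa", "Uganda kyouwakoku no", "shuto desu"]
sentence_deps = {(0, 2), (1, 2)}

def toy_bnst1(qi, si):
    """Toy similarity: 1.0 if the first words agree, 0.5 for the interrogative."""
    q_head = question[qi].split()[0]
    s_head = sentence[si].split()[0]
    if q_head == "doko":
        return 0.5
    return 1.0 if q_head == s_head else 0.0

def toy_bnst2(q_pair, s_pair):
    """Toy similarity: 1.0 if the paired bunsetsus are also in a dependency."""
    return 1.0 if s_pair in sentence_deps else 0.0

best_score, best_pairing = score_sentence(
    question, question_deps, sentence, toy_bnst1, toy_bnst2, alpha=0.5, beta=0.25)
```

With these toy values the best pairing maps "Uganda no" to "Uganda kyouwakoku no", "shuto wa" to "shuto desu", and the interrogative bunsetsu to "kanpara wa".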

We calculate the similarity between two words by using the EDR dictionaries [?]⁴. In the
case of a bunsetsu containing an interrogative pronoun, the similarity is calculated case
by case. For example, when a bunsetsu in the question sentence is "where" and the
paired bunsetsu in the sentence s has the meaning of location⁵, the similarity between them is
set to a high value.

Our system performs the above matching process and selects the answer from the sentence 
having the highest score. The answer is selected by considering a bunsetsu paired with a 
bunsetsu containing an interrogative pronoun as the desired answer. 
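
The interrogative handling and the answer selection described above can be sketched as follows. This is only an illustration: the small tables `INTERROGATIVES` and `TOY_CATEGORIES` are hypothetical stand-ins for a thesaurus lookup such as the EDR dictionaries, the similarity values 1.0 and 0.0 are placeholders, and the function names are not from the paper.

```python
# Hypothetical mapping from interrogative words to the expected answer category.
INTERROGATIVES = {"doko": "location", "dare": "person"}

# Hypothetical stand-in for a thesaurus lookup (e.g. the EDR dictionaries).
TOY_CATEGORIES = {"kanpara": "location", "uganda": "location"}

def interrogative_similarity(q_bunsetsu, s_bunsetsu, high=1.0, low=0.0):
    """Return `high` if s_bunsetsu has the category the interrogative asks for."""
    expected = {INTERROGATIVES[tok] for tok in q_bunsetsu if tok in INTERROGATIVES}
    found = {TOY_CATEGORIES[tok] for tok in s_bunsetsu if tok in TOY_CATEGORIES}
    return high if expected & found else low

def select_answer(q_bunsetsus, s_bunsetsus, pairing):
    """Return the sentence bunsetsu paired with the interrogative bunsetsu.

    pairing[i] is the index of the sentence bunsetsu paired with question
    bunsetsu i, or None if it is unpaired.
    """
    for i, q_bunsetsu in enumerate(q_bunsetsus):
        has_interrogative = any(tok in INTERROGATIVES for tok in q_bunsetsu)
        if has_interrogative and pairing[i] is not None:
            return s_bunsetsus[pairing[i]]
    return None

# Toy bunsetsus as token lists (segmentation is assumed to come from the parser).
question = [["uganda", "no"], ["shuto", "wa"], ["doko", "desu", "ka"]]
sentence = [["kanpara", "wa"], ["uganda", "kyouwakoku", "no"], ["shuto", "desu"]]

similarity = interrogative_similarity(question[2], sentence[0])
answer = select_answer(question, sentence, pairing=(1, 2, 0))
```

Here the bunsetsu "kanpara wa" is returned because it is the one paired with the bunsetsu containing "doko" ("where").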

In general, the answer to a question sentence can be obtained by matching the question
sentence and the database sentences. In the case of YES-NO questions, the system has only to
match the question sentence and the database sentence, and output YES if they match (or NO
otherwise). In the case of fill-in-the-blanks-type questions⁶, the system has only to consider

4 The similarity between words can be handled by using thesauri. But the similarity between long expressions
such as clauses is difficult to handle. To solve this problem, we have already considered a method that uses
rewriting rules [?]. This method will be described in later papers.

5 Specifying bunsetsus whose meanings are locations is done by using thesauri such as the EDR dictionaries.

6 The process of solving fill-in-the-blanks-type questions can be considered as a case of ellipsis resolution if
the blanks are considered as ellipses. We have already discussed how corpora can be used in ellipsis resolution
[?]. So we should be able to use corpora to fill the blanks in fill-in-the-blanks-type questions.
the element of the database sentence paired with an interrogative pronoun such as "What"
as the desired answer. Our approach here is an implementation of this idea using syntactic
information.

3 Example 

This section shows three examples in which our system obtained correct answers.

We used as question sentences the English-to-Japanese translations of sample sentences in
TREC8. We used as database sentences the Daijirin Japanese word dictionary and the Mainichi
Japanese newspaper (1991-1998). When we used the Daijirin dictionary, we added the string
"entry word + wa (topic-marking functional word)" to the beginning of each sentence.

First, we input the following Japanese sentence into our system.

Uganda no shuto wa doko desu ka. 
(Uganda) (of) (capital) topic (where) (be) (?) 
(What is the capital of Uganda?) 

As a result of calculating the score of Eq. (1), the following sentence in the Daijirin dictionary
had the highest score and "Kampala" was correctly selected.

kanpara wa Uganda kyouwakoku no shuto desu. 
(Kampala) topic (Uganda) (republic) (of) (capital) (be) 
(Kampala is the capital of the Uganda republic.) 

The score is calculated as follows.

Score = 9.7 (Matching between "Uganda" and "Uganda republic")
      + 5.9 (Matching between "capital" and "capital")
      + 1.6 (Matching between "capital of Uganda" and "capital of Uganda republic")
      = 17.2                                                                        (4)
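
As a small check of the arithmetic in Eq. (4), the listed terms can be grouped into matches of single bunsetsus and matches of a dependency pair and then summed. The grouping labels are an interpretation added for illustration; the values are those listed in Eq. (4).

```python
# Terms listed in Eq. (4). "single" = match of one bunsetsu,
# "pair" = match of a dependency pair (labels are an interpretation).
terms = [
    ("single", "Uganda / Uganda republic", 9.7),
    ("single", "capital / capital", 5.9),
    ("pair", "capital of Uganda / capital of Uganda republic", 1.6),
]

# Round to one decimal place to avoid floating-point noise in the sums.
single_total = round(sum(v for kind, _, v in terms if kind == "single"), 1)
pair_total = round(sum(v for kind, _, v in terms if kind == "pair"), 1)
score = round(single_total + pair_total, 1)
```

The single-bunsetsu matches contribute 15.6 and the pair match contributes 1.6, which gives the total of 17.2.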

Next, we input the following Japanese sentence into our system.

magunakaruta ga tyouin sareta no-wa nan nen desu ka. 

(Magna Carta) subject (sign) passive topic (what) (year) (be) (?) 
(What year was the Magna Carta signed?) 

As a result of calculating the score of Eq. (1), the following sentence in the Daijirin dictionary
had the highest score and "1215" was correctly selected.

magunakaruta wa 1215 nen igirisu no houken shokou ga 
(Magna Carta) topic (1215) (year) (England) (of) (feudal) (lords) subject 

kokuou jon ni semari, ouken no seigen to 

(king) (John) object (press) (royal authority) (of) (limitation) (and) 
shokou no kenri wo kakunin saseta bunsho.
(lords) (of) (right) object (confirm) causative (document). 

(Magna Carta is the document in which feudal lords of England made King John 
confirm the limitation of the royal authority and their rights in 1215.) 

The score is calculated as follows.

Score = 32.0 (Matching between nan nen "what year" and 1215 nen "1215 year")
      + 14.6 (Matching between "Magna Carta" and "Magna Carta")
      = 48.1                                                                        (5)

Finally, we input the following Japanese sentence into our system.

paakinson byou wa nou no dono bubun ni-aru saibou no 

(Parkinson) (disease) topic (brain) (of) (what) (area) (in) (cell) (of) 

shi ni kankei-shite-imasu ka. 

(demise) (to) (be linked) (?) 

(The symptoms of Parkinson's disease are linked to the demise of cells in what area 
of the brain?) 

As a result of calculating the score of Eq. (1), the following sentence in the Mainichi newspaper
had the highest score and "substantia nigra" was correctly selected.

paakinson byou wa tyuunou no kokushitsu ni-aru meranin 

(Parkinson) (disease) topic (midbrain) (of) (substantia nigra) (in) (melanin) 

saibou ga hensei-shi, kokusitsu saibou nai-de tsukurareru 

(cell) subject (degenerate) (substantia nigra) (cell) (in) (be made) 

shinkei-dentatsu-busshitsu no doupamin ga nakunari hatsubyou-suru, 

(neurotransmitter) (of) (dopamine) subject (run out) (be taken ill) 

to-sarete-iru. 
(be recognized) 

(Parkinson's disease is recognized when melanin cells in the substantia nigra of the midbrain
degenerate. The neurotransmitter dopamine, which is made in substantia nigra cells, runs
out, and Parkinson's disease arises.)

The score was calculated as follows.

Score = 10.6 (Matching between "Parkinson's disease" and "Parkinson's disease")
      + 6.3 (Matching between "cell" and "melanin cell")
      + 1.5 (Matching between "brain" and "midbrain")
      + 0.4 (Matching between "area of brain" and "substantia nigra of midbrain")
      + 0.3 (Matching between "in area" and "in substantia nigra")
      = 32.2                                                                        (6)

The question phrase "cells in [interrogative pronoun] of brain" and the phrase "cells in
substantia nigra of midbrain" were matched, and "substantia nigra" was correctly selected.

4 Conclusion 

We have outlined our question answering system using syntactic information. We intend to run 
more experiments, to make our system more robust. 

We think that the human sentence-reading process involves a matching process between the
sentence being read now and data recalled in the brain [12]. Our question answering system
matches question sentences and sentences in its database, and may therefore provide some clues
to shed light on the human reading process. Future work will involve extending this current
work to work on the human reading process.